Abstract

Background: Targeted sampling can capture the characteristics of more vulnerable sectors of a population, but 
may bias the picture of population-level disease risk. When sampling network data, an incomplete description of
the population may arise, leading to biased estimates of between-host connectivity. Avian influenza (AI) control
planning in Great Britain (GB) provides one example: network data for the poultry industry (the Poultry
Network Database or PND) targeted large premises and are consequently demographically biased. Exposing the
effect of such biases on the geographical distribution of network properties could help target future poultry 
network data collection exercises. These data will be important for informing the control of potential future disease 
outbreaks. 

Results: The PND was used to compute between-farm association frequencies, assuming that farms sharing the
same slaughterhouse or catching company, or linked through integration, are potentially epidemiologically linked. The
fitted statistical models were extrapolated to the Great Britain Poultry Register (GBPR); this dataset is more 
representative of the poultry industry but lacks network information. This comparison showed how systematic 
biases in the demographic characterisation of a network, resulting from targeted sampling procedures, can bias the 
derived picture of between-host connectivity within the network. 

Conclusions: With particular reference to the predictive modelling of AI in GB, we find significantly different
connectivity patterns across GB when network estimates incorporate the more demographically representative 
information provided by the GBPR; this has not been accounted for by previous epidemiological analyses. We 
recommend ranking geographical regions, based on relative confidence in extrapolated estimates, for prioritising 
further data collection. Evaluating whether and how the between-farm association frequencies impact on the risk 
of between-farm transmission will be the focus of future work. 



Background 

Targeted collation of contact data typically represents only
a small subset of the true population, and if these data are
biased this may lead to misinterpretation of recorded contact
structures [1-3]. Consequently, heterogeneities in
population contact structure can be poorly characterised.
The importance of such contact heterogeneities for infectious
disease transmission has been highlighted through
the development of social network models in humans [4]



* Correspondence: s.nickbakhsh@vet.gla.ac.uk 

1 Boyd Orr Centre for Population and Ecosystem Health, Institute for 

Biodiversity, Animal Health and Comparative Medicine, University of 

Glasgow, Bearsden Road, Scotland, G61 1QH, UK 

Full list of author information is available at the end of the article 

BioMed Central



and movement network models in livestock [5-10]. In 
Great Britain (GB), the application of network analysis to 
livestock movements has been uniquely informed by a 
well-defined temporally explicit Cattle Tracing System 
(CTS) database [11,12]. However, even in this case there is 
some evidence of potential bias in cattle movement pat- 
terns arising through missing or incorrect movement 
records at the level of the type of enterprise [13]. Such sys- 
tematic errors, arising from data collection procedures and 
inaccuracies in reported information, may lead to biases in 
the quantification of network properties. Bias identifica- 
tion is therefore an important step in ensuring model 
validity. 
Mathematical models of avian influenza (AI) in Great
Britain (GB) have been largely informed by the Poultry 
Network Database (PND), providing poultry network 
information for a subset of the industry, and the Great 
Britain Poultry Register (GBPR) which provides more 
representative demographic information. Although the 
PND does not reflect temporally explicit movements onto
and off farms, shared industry associations have been
used to infer potential contacts between farms and have 
informed stochastic simulation and exploratory models 
[14-16]. For example, all farms that are associated with a 
particular slaughterhouse are assumed potentially epide- 
miologically linked to one another. In the absence of epi- 
demic data, and therefore without the ability to validate 
predictive models for AI control in GB, mathematical 
models are a valuable tool for exploring the connectivity 
of the poultry industry. These epidemiological models 
have investigated the efficacy of current control measures 
for AI in GB and have identified particular scenarios that 
could result in a large outbreak [14-16]. 

The PND was collated in 2006 by the Veterinary 
Laboratories Agency (VLA). This was designed to establish 
farms that share industry associations such as through 
catching companies (CCs), slaughterhouses (SHs) or 
through being part of a larger integrated company (IC). 
From this, an estimate of between-farm association fre- 
quency (i.e. the maximum number of farms a single farm 
may be associated with) can be made at a farm-level, 
which can be used to inform logistical considerations dur- 
ing a disease outbreak prior to the implementation of 
movement restrictions [17]. These between-farm associa- 
tions inferred from the PND have been used as a proxy for 
between-farm "contacts" as they are considered to repre- 
sent potential routes of between-farm spread of infection 
through personnel, shared equipment and vehicles [16]. 
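
The between-farm association frequency described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each farm record lists the slaughterhouses, catching companies and integrated companies it is associated with; the farm and facility identifiers are hypothetical and the function is not the authors' implementation.

```python
from collections import defaultdict


def association_frequency(farm_links):
    """Count, for each farm, the distinct other farms sharing any facility.

    farm_links maps a farm id to the set of facility ids (slaughterhouse,
    catching company or integrated company) it is associated with.
    """
    # Invert the mapping: facility id -> set of farms using that facility.
    members = defaultdict(set)
    for farm, facilities in farm_links.items():
        for facility in facilities:
            members[facility].add(farm)

    frequency = {}
    for farm, facilities in farm_links.items():
        partners = set()
        for facility in facilities:
            partners |= members[facility]
        partners.discard(farm)  # a farm is not associated with itself
        frequency[farm] = len(partners)
    return frequency


# Hypothetical toy data, not taken from the PND.
links = {
    "farm_A": {"SH1", "CC1"},
    "farm_B": {"SH1"},
    "farm_C": {"CC1", "IC1"},
    "farm_D": {"IC1"},
    "farm_E": set(),
}
freq = association_frequency(links)
```

Counting distinct partner farms, rather than summing per-facility counts, avoids double counting a pair of farms that share more than one facility.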

Epidemiological evidence from previous outbreaks of AI
indicates a role for indirect transmission via fomites, for
example through shared equipment, the reuse of disposable
egg-trays, the movement of vehicles (during chick
delivery, the delivery of feed, and the collection of dead
birds), the management practices of integrated companies,
contaminated bird-carrying crates during slaughterhouse-related
farm visits and through the clothing, shoes and
hands of farm visitors [18-27]. Such mechanisms of transmission
via fomites are also identified as sources of possible
risk through catching company personnel and vehicles
associated with slaughterhouse-related farm visits [28].

Whilst this evidence is largely circumstantial, arising 
from epidemiological investigations, it is considered likely 
that AI will share the same mechanisms for between- 
farm transmission as other pathogens similarly trans- 
mitted via the faecal-oral route [29], such as Salmonella, 
Campylobacter and those associated with coccidiosis 
[16]. Fomites have been implicated in poultry flock 



infections caused by these pathogens and represent possible
mechanisms of between-farm transmission; for example,
during slaughterhouse-related farm visits via
equipment such as bird-carrying crates and pallets, the
wheels of forklift trucks and slaughterhouse vehicles, the
boots of drivers and catchers, as well as via staff and
equipment shared between different farm premises
[20,30-34]. Evidence from previous outbreaks also suggests
that spatial spread, possibly via airborne mechanisms,
may play an important role between farms
within close proximity [18,20,25,35,36]. However, this
mechanism is considered to be relatively less important
for GB compared with countries such as the Netherlands
[35], which has regions of greater poultry farm density.

As a result of the targeted sampling of known SHs and 
CCs, missing data inherently biases the PND towards 
large poultry premises. Therefore the PND cannot be 
considered representative of the entire GB poultry indus- 
try and was never intended to be so [Lucy Snow, pers. 
comm.]. It has been shown that even when individuals 
are sampled at random, this process may not result in a 
random representation of their contacts, and conse- 
quently overall network properties [1,2,37]. Missing data 
within the PND are inherently non-random, and there- 
fore systematic differences in the types of farms sampled 
compared to those unsampled may further exacerbate 
the misrepresentation of network properties, and the 
identification of high risk sectors of the poultry industry. 
The validity of generalising PND-informed network properties
to a national scale is potentially reduced by missing
farms. Therefore, establishing the likely characteristics of
these missing farms, based on the known properties of 
those that are well-characterised, is an important step to 
inform future data collection exercises. It is only through 
a more representative characterisation of the poultry 
industry that contact heterogeneities can be usefully 
applied to predictive models of poultry disease control. 

To our knowledge, the appropriateness of using 
inferred industry contacts from the PND for informing 
predictive AI models in GB has not been considered in 
the published literature. In particular, the potential implications
of targeted sampling procedures for predictive
modelling of AI control have yet to be quantified. Potential
biases in inferred poultry network properties may
have important consequences for government preparedness
in distributing resources during an outbreak; the
extent of between-farm spread may depend on how rapidly
and where the movement restrictions that inhibit this
risk are implemented. As the human health, animal welfare
and economic consequences of a large AI outbreak
could potentially be catastrophic [38-44], government
and industry preparedness for such an event is vital.

Our aim was to identify geographical areas with biases 
in the farm contact structure by extrapolating network 
data informed by the PND to the GBPR, which is more
demographically representative of GB poultry farms but
lacks detailed information on between-farm associations
via the poultry industry. The GBPR was
established by the British Department for Environment,
Food and Rural Affairs (Defra) in December 2005, and
it is mandatory for all commercial farms holding more
than 50 birds to record their farm-related details [45].

Specifically, our objectives were to: (i) determine sta- 
tistical associations between farm-level factors and net- 
work informed between-farm association frequency, 
using multivariable logistic regression; (ii) extrapolate 
the fitted statistical models to each farm recorded in the 
GBPR, obtaining predicted probabilities for categorical 
between-farm association frequency; (iii) compare the 
regional-level (GB divided into eleven geographical 
regions) distribution of PND-informed between-farm 
association frequencies with estimates following extrapo- 
lation to the GBPR. 

Results 

The poultry industry network 

The PND, with between-farm associations assumed to 
arise through shared industry contacts, was highly con- 
nected: most farms were potentially associated with almost 
all other farms, mostly through slaughterhouses (SHs) and 
catching companies (CCs) (Figure 1). This is consistent
with previous work using the PND which reports that, 
when all types of industry contacts are combined, the 
giant component of the network (i.e. the largest group of 
connected farms) includes the majority of premises [16]. 
The largest SH is important for connecting smaller clus- 
ters of farms that are themselves connected to each other 
through SHs (Figure 1b).
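
As an illustration of the giant component mentioned above, the following Python sketch finds the largest group of farms connected through shared facilities, using a union-find structure. The input format and all identifiers are hypothetical; this is a sketch of the general technique, not the procedure used to produce Figure 1.

```python
def largest_farm_component(farm_links):
    """Return the largest set of farms connected via shared facilities.

    farm_links maps a farm id to the set of facility ids it is associated
    with. Farms and facilities are the two node types of a bipartite graph.
    Raises ValueError if farm_links is empty.
    """
    parent = {}

    def find(node):
        # Find the representative of a node's component (with path halving).
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for farm, facilities in farm_links.items():
        farm_root = find(("farm", farm))
        for facility in facilities:
            facility_root = find(("facility", facility))
            if facility_root != farm_root:
                # Merge the facility's component into the farm's component.
                parent[facility_root] = farm_root

    components = {}
    for farm in farm_links:
        components.setdefault(find(("farm", farm)), set()).add(farm)
    return max(components.values(), key=len)


# Hypothetical toy data: farms A-C are linked through SH1 and CC1,
# whereas farms D and E share only SH2.
toy_links = {
    "A": {"SH1"},
    "B": {"SH1", "CC1"},
    "C": {"CC1"},
    "D": {"SH2"},
    "E": {"SH2"},
}
giant = largest_farm_component(toy_links)
```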

Assessing the introduction of bias following data 
reduction 

The univariable odds ratios (ORs), computed both before 
and after the exclusion of farm records with missing pre- 
dictor variable data (see Methods section), did not sug- 
gest that any significant biases would be introduced to 
either the scenario 1 or 2 analyses (Tables 1 and 2 
respectively). Therefore the reduced dataset was used for 
the multivariable statistical modelling. 
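
The "% change" column of Tables 1 and 2 can be reproduced with the short Python sketch below. The formula is an assumption inferred from the tabulated values, which appear consistent with the absolute difference between the two odds ratios divided by the larger of the two; this section does not state the formula explicitly. The 25% threshold is the one given in the table footnotes.

```python
def percent_change(or_full, or_reduced):
    """Percentage change between two odds ratios, relative to the larger.

    Assumed formula (inferred from the tabulated values, not stated in
    the text): 100 * |OR_full - OR_reduced| / max(OR_full, OR_reduced).
    """
    return 100.0 * abs(or_full - or_reduced) / max(or_full, or_reduced)


def exceeds_threshold(or_full, or_reduced, threshold=25.0):
    """Flag predictors whose odds ratio changed by more than the threshold."""
    return percent_change(or_full, or_reduced) > threshold


# Values taken from Table 1 (scenario 1).
house_count_change = round(percent_change(0.354, 0.299), 1)  # L vs. S house count
wales_change = round(percent_change(0.119, 0.192), 1)        # Wales
```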

Scenario 1: predictors of large between-farm association 
frequency 

Equation 1 shows the form of the logistic model used to
identify predictors of a large between-farm association
frequency (L_af; referred to as scenario 1, see Methods for
further details). The logit function represents a nonlinear
transformation of the probability that farm i has a L_af, Pr




Figure 1 Components of the British poultry industry network. 

Full contact network between poultry farms, slaughterhouses (SHs), 
catching companies (CCs) and integrated companies (ICs) (a), and 
network components partitioned into associations between farms 
and SHs (b), farms and CCs (c) and between farms within ICs (d), 
using farms for which complete contact information was known (n 
= 662). Orange = farm, red = SH, black = largest SH, green = CC, 
blue = IC. 
Table 1 Crude odds ratios before and after removal of records with missing data: scenario 1 analyses

                          Full data (n = 662 farms)          Following the removal of records (n = 348 farms)
Farm-level predictors     OR^a     s.e. OR^b     p-value     OR^a     s.e. OR^b     p-value     % change
L vs. S bird count^e      0.284    [illegible]   <0.0001     0.185    1.307         <0.0001     34.9^c
L vs. S house count^e     0.354    1.306         <0.0001     0.299    1.327         <0.0001     15.5
Indoors                   0.298    1.289         <0.0001     0.301    1.307         <0.0001     1.0
Free-range                5.266    1.298         <0.0001     5.010    1.317         <0.0001     4.9
Housing other             0.829    1.510         0.650       0.598    1.590         0.268       27.9^c
Partial housing           1.009    1.941         0.990       0.763    2.209         0.733       24.4
Integrated                0.271    1.222         <0.0001     0.258    1.309         <0.0001     4.8
East                      1.905    1.298         0.014       2.545    1.392         0.005       25.1
Scotland                  0.333    2.144         0.150       0.308    2.899         0.268       7.5
Wales                     0.119    2.087         0.004       0.192    2.848         0.115       38.0^d
West                      0.418    1.287         0.001       0.401    1.429         0.010       4.1

^a OR = odds ratio; ^b s.e. = standard error of the odds ratio; ^c >25% change in odds ratio but direction of association and significance is comparable; ^d single
variable for which there is >25% change in odds ratio and no change in direction of association, but significance is altered; ^e L = large, S = small.



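(L_af,i), β_0 is the average log-odds of a L_af for farms within
the baseline predictor variable categories, β_1, β_2 ... β_11 are
average log-ORs for each predictor variable (see Tables 3
and 4 for definitions of the linear predictors), and β_12, β_13 and
β_14 are the log-ORs for farms in the baseline categories
for interacting variables.

$$
\begin{aligned}
\operatorname{logit}\,\Pr(L_{af,i}) = {} & \beta_0 + \beta_1\,\mathrm{hbLS}_i + \beta_2\,\mathrm{hbSL}_i \\
& + \beta_3\,\mathrm{hbLL}_i + \beta_4\,\mathrm{ifNN}_i + \beta_5\,\mathrm{ifNY}_i + \beta_6\,\mathrm{ifYY}_i \\
& + \beta_7\,\mathrm{Integrated}_i + \beta_8\,\mathrm{East}_i + \beta_9\,\mathrm{Scotland}_i \\
& + \beta_{10}\,\mathrm{Wales}_i + \beta_{11}\,\mathrm{West}_i + \beta_{12}\,\mathrm{ifNN}_i \cdot \mathrm{Integrated}_i \\
& + \beta_{13}\,\mathrm{ifNY}_i \cdot \mathrm{Integrated}_i + \beta_{14}\,\mathrm{ifYY}_i \cdot \mathrm{Integrated}_i
\end{aligned}
\tag{1}
$$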

Management type and poultry house count were
found to be significantly associated with between-farm
association frequency (Table 5); farms keeping only
free-range birds were more likely (OR = 12.19, 95% CI
= 3.82-38.91, p < 0.001), and farms with a large poultry
house count were less likely (OR = 0.16, 95% CI =
0.04-0.64, p = 0.009 and OR = 0.32, 95% CI = 0.14-0.71,
p = 0.005, for farms with small and large bird
counts respectively) to be assigned L_af. There was also
evidence of association with geographical location;
farms located in the West of England were less likely
than farms located in the North of England to be
assigned L_af (OR = 0.32, 95% CI = 0.14-0.76, p = 0.01).
The effect of management type was found to differ
depending on the integration status of the farm; free-range
integrated farms were significantly less likely
than free-range non-integrated farms to be assigned L_af
and vice versa (interaction coefficient = 0.13, 95% CI =
0.03-0.59, p = 0.009). There was no evidence of a poor
fit to the data based on an assessment of the model
residuals or model predictive ability (area under the
ROC curve for varying model sensitivity and specificity
= 0.86).
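
To illustrate how the odds ratios in Table 5 combine into a predicted probability under Equation 1, the Python sketch below multiplies the baseline odds by the ORs of the terms that apply to a farm. It uses the rounded values reported in Table 5 and the standard interpretation of a logistic model with an interaction term; the function name and dictionary layout are hypothetical, and the results are illustrative rather than the authors' extrapolated estimates.

```python
import math

# Rounded values from Table 5 (scenario 1); the intercept is the baseline odds.
INTERCEPT_ODDS = 0.608
ODDS_RATIOS = {
    "hbLS": 0.161, "hbSL": 0.507, "hbLL": 0.317,
    "ifNN": 1.810, "ifNY": 12.185, "ifYY": 3.072,
    "Integrated": 0.681,
    "East": 1.337, "Scotland": 0.324, "Wales": 0.126, "West": 0.321,
    "ifNN*Integrated": 0.900, "ifNY*Integrated": 0.128,
    "ifYY*Integrated": 0.965,
}


def predicted_probability(active_terms):
    """Probability of a large association frequency for one farm.

    active_terms lists the non-baseline indicators equal to 1 for the farm,
    including any interaction term that applies.
    """
    log_odds = math.log(INTERCEPT_ODDS)
    for term in active_terms:
        log_odds += math.log(ODDS_RATIOS[term])
    # Inverse logit transformation.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Free-range only, non-integrated, small house and bird counts, North.
p_free_range = predicted_probability(["ifNY"])
# The same farm if integrated: main effects plus the interaction term.
p_free_range_integrated = predicted_probability(
    ["ifNY", "Integrated", "ifNY*Integrated"]
)
```

With these rounded inputs the two probabilities are approximately 0.88 and 0.39, which is in line with the reported interaction between free-range management and integration status.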



Table 2 Crude odds ratios before and after removal of records with missing data: scenario 2 analyses

                          Full data (n = 662 farms)        Following the removal of records (n = 348 farms)
Farm-level predictors     OR^a     s.e. OR^b   p-value     OR^a     s.e. OR^b   p-value     % change
L vs. S bird count^d      7.304    1.485       <0.0001     5.588    1.369       <0.0001     23.5
L vs. S house count^d     1.783    1.289       0.023       1.879    1.310       0.019       5.1
Indoors                   2.888    1.425       0.003       3.822    1.472       0.001       24.4
Free-range                0.280    1.529       0.003       0.227    1.579       0.001       18.9
Housing other             0.363    1.648       0.043       0.338    1.656       0.031       6.9
Partial housing           0.890    1.991       0.866       0.592    2.254       0.518       33.5^c
Integrated                0.544    1.232       0.003       0.554    1.319       0.033       1.8
East                      0.446    1.401       0.017       0.426    1.538       0.047       4.5
Scotland                  0.061    2.837       0.007       0.090    2.914       0.024       32.2^c
Wales                     0.090    1.722       <0.0001     0.116    2.176       0.006       22.4
West                      0.047    1.477       <0.0001     0.056    1.583       <0.0001     16.1

^a OR = odds ratio; ^b s.e. = standard error of the odds ratio; ^c >25% change in odds ratio but direction of association and significance is comparable; ^d L = large, S =
small.
Table 3 Farm-level predictors used in statistical analyses of associations with between-farm association frequency

Predictor variable    Data type                        Description                                                   % missing (n = 662)
Species               Categorical                      Production type or poultry species                            54.38
Bird count^a          Binary (large vs. small)         Total number of birds on site                                 41.39
House count^a         Binary (large vs. small)         Total number of poultry houses on site                        40.03
Indoors               Binary (yes vs. no)              Categorisation of whether the premises houses any of its      38.97
                                                       birds indoors (e.g. barn, cage or pole barn)
Outdoors^b            Binary (yes vs. no)              Categorisation of whether the premises houses any of its      38.97
                                                       birds outside
Free-range^b          Binary (yes vs. no)              Categorisation of whether the premises has registered any     38.97
                                                       free range birds
Housing other^b       Binary (yes vs. no)              Categorisation of whether the premises keeps any of its       38.97
                                                       birds in other housing
Partial housing^b     Binary (yes vs. no)              Categorisation of whether the premises keeps any of its       38.97
                                                       birds in partial housing (e.g. coop, brooder house,
                                                       shelter pen or grass run)
Integrated            Binary (yes vs. no)              Whether premises is part of an integrated company or          6.95
                                                       associated with a company
Region                Categorical (East^c, Scotland,   Regional location of premises within GB based on the          8.61
                      Wales, West^c vs. North^c)       county in the premises address

^a Following categorisation of original numeric variables

^b Original categorisation for variables indicating the farm management type

^c Geographical regions of England



Scenario 2: predictors of medium between-farm 
association frequency 

Equation 2 shows the form of the logistic model used to
identify predictors of a medium between-farm association
frequency (M_af; referred to as scenario 2, see Methods
for further details). The logit function represents a
nonlinear transformation of the probability that farm i
has a M_af, Pr(M_af,i), β_0 is the average log-odds of a M_af
for farms within the baseline predictor variable categories
and β_1, β_2 ... β_8 are average log-ORs for each predictor
variable (see Tables 3 and 4 in the methods for
definitions of the linear predictors).

$$
\begin{aligned}
\operatorname{logit}\,\Pr(M_{af,i}) = {} & \beta_0 + \beta_1\,\mathrm{hbLS}_i + \beta_2\,\mathrm{hbSL}_i \\
& + \beta_3\,\mathrm{hbLL}_i + \beta_4\,\mathrm{Integrated}_i + \beta_5\,\mathrm{East}_i \\
& + \beta_6\,\mathrm{Scotland}_i + \beta_7\,\mathrm{Wales}_i + \beta_8\,\mathrm{West}_i
\end{aligned}
\tag{2}
$$

In contrast to scenario 1 analyses, bird count rather than 
poultry house count was a significant predictor of 
between-farm association frequency (Table 6). Farms with 



a large bird count were significantly more likely to be
assigned M_af (OR = 6.89, 95% CI = 2.18-21.76, p = 0.001
and OR = 6.22, 95% CI = 2.25-17.25, p < 0.001, for farms
with small and large poultry house counts respectively).
Similarly to scenario 1 analyses, integrated companies
were significantly less likely than non-integrated companies
to be assigned M_af (OR = 0.44, 95% CI = 0.21-0.92, p
= 0.03). Geographic location was also found to be important;
farms located in Scotland, Wales and the West of
England were significantly less likely than farms located in
the North of England to be assigned M_af (ORs = 0.045 to
0.073, p ≤ 0.005). There was no evidence of a poor fit to
the data based on an assessment of the model residuals or
model predictive ability (area under the ROC curve for
varying model sensitivity and specificity = 0.83).

Comparative analysis of geographical variation 

Comparing the PND with the GBPR, the geographical 
distribution of sampling coverage and capacity was 



Table 4 Definitions of farm-level predictors grouped into their cross classifications as used in statistical analyses

Original variables*       Description of cross-classification        Predictor variable ID    Farm frequency
House count/Bird count    house count = small, bird count = small    hbSS                     128
                          house count = small, bird count = large    hbSL                     53
                          house count = large, bird count = small    hbLS                     50
                          house count = large, bird count = large    hbLL                     117
Indoor/Free-range         indoor = no, free-range = no               ifNN                     26
                          indoor = no, free-range = yes              ifNY                     79
                          indoor = yes, free-range = no              ifYN                     229
                          indoor = yes, free-range = yes             ifYY                     14

*see Table 3
Table 5 Results from multivariable logistic regression: scenario 1 analyses (n = 348 farms)

Farm-level predictor      Predictor levels^a    OR^b       lower 95% CI^c    upper 95% CI^c    p-value
Intercept^d               -                     0.608      0.264             1.401             0.243
House count/Bird count    hbSS^e                1          -                 -                 -
                          hbLS                  0.161      0.041             0.636             0.009
                          hbSL                  0.507      0.207             1.246             0.139
                          hbLL                  0.317      0.141             0.711             0.005
Indoor/Free-range         ifYN^e                1          -                 -                 -
                          ifNN                  1.810      0.275             11.927            0.537
                          ifNY                  12.185     3.815             38.913            <0.001
                          ifYY                  3.072      0.725             13.018            0.128
Integration status        Non-integrated^e      1          -                 -                 -
                          Integrated            0.681      0.307             1.512             0.345
Geographical location     Region: North^e       1          -                 -                 -
                          Region: East          1.337      0.567             3.152             0.507
                          Region: Scotland      0.324      0.038             2.752             0.302
                          Region: Wales         0.126      0.012             1.329             0.085
                          Region: West          0.321      0.135             0.761             0.010
Interaction terms         ifNN*Integrated       0.900^f    0.075             10.778            0.934
                          ifNY*Integrated       0.128^f    0.028             0.594             0.009
                          ifYY*Integrated       0.965^f    0.048             19.526            0.982

^a See Tables 3 and 4 for definitions; ^b OR = odds ratio; ^c CI = confidence interval of the OR; ^d average odds in the baseline predictor groups; ^e reference-level
category; ^f OR for farms in the baseline category of the other interacting variable; results significant at 5% error level are indicated in bold.



noticeably different (Figures 2a and 2b). It is possible
that this misrepresentation of farms within the PND has
led to systematic error (or bias) in the inherent
description of the network. Indeed, following the extra-
polation of between-farm association frequency to the 
GBPR, substantial differences were found when com- 
pared to the observations from the PND. Comparing 
both datasets, the probabilities obtained were signifi- 
cantly different for all regions (Figures 3a and 3b); the 
values inferred from the PND do not overlap the 95% 
confidence intervals (CIs) generated for the estimates 



obtained using the GBPR data (see Methods section for 
further details on the simulations used to generate these 
CIs). 
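
The simulation-based confidence intervals referred to above can be sketched in Python as follows. The sketch assumes that each farm in a region has predicted probabilities of a small, medium and large between-farm association frequency, that each simulation assigns every farm to one category at random according to those probabilities, and that the 95% CI is taken as the 2.5th to 97.5th percentile of the simulated regional proportions. The percentile rule, function name and input layout are assumptions made for illustration; the Methods section gives the procedure actually used.

```python
import random

CATEGORIES = ("small", "medium", "large")


def simulate_regional_ci(farm_probs, n_sims=1000, seed=1):
    """Percentile 95% CIs for the regional proportion in each category.

    farm_probs is a list with one (p_small, p_medium, p_large) tuple per farm.
    """
    rng = random.Random(seed)
    n_farms = len(farm_probs)
    shares = {category: [] for category in CATEGORIES}
    for _ in range(n_sims):
        counts = dict.fromkeys(CATEGORIES, 0)
        for probs in farm_probs:
            # Assign the farm to one category with its predicted probabilities.
            counts[rng.choices(CATEGORIES, weights=probs)[0]] += 1
        for category in CATEGORIES:
            shares[category].append(counts[category] / n_farms)
    # Indices of the 2.5th and 97.5th percentiles of the sorted simulations.
    lower_index = int(0.025 * n_sims)
    upper_index = int(0.975 * n_sims) - 1
    return {
        category: (sorted(values)[lower_index], sorted(values)[upper_index])
        for category, values in shares.items()
    }


# Hypothetical region of 200 farms sharing the same predicted probabilities.
example_ci = simulate_regional_ci([(0.2, 0.3, 0.5)] * 200)
```

A region with few farms produces wider intervals under this scheme, because each simulated proportion is averaged over fewer random assignments.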

Comparing the regions within Great Britain, geogra- 
phical variation in the predicted probabilities extrapo- 
lated to the GBPR data was observed; neighbouring 
regions were found to be typically more similar to each 
other. For example, three regional clusters were 
observed: (i) the North West, North East, Yorkshire, 
East Midlands and Eastern regions of England, (ii) 
Greater London and the South East of England, and (iii) 



Table 6 Results from multivariable logistic regression: scenario 2 analyses (n = 270 farms)

Farm-level predictor      Predictor levels^a    OR^b       lower 95% CI^c    upper 95% CI^c    p-value
Intercept^d               -                     0.603^d    0.234             1.553             0.294
House count/Bird count    hbSS^e                1          -                 -                 -
                          hbLS                  0.148      0.016             1.352             0.091
                          hbSL                  6.891      2.182             21.762            0.001
                          hbLL                  6.224      2.246             17.247            <0.001
Integration status        Non-integrated^e      1          -                 -                 -
                          Integrated            0.442      0.212             0.922             0.030
Geographical location     Region: North^e       1          -                 -                 -
                          Region: East          0.510      0.200             1.301             0.159
                          Region: Scotland      0.045      0.005             0.396             0.005
                          Region: Wales         0.073      0.015             0.362             0.001
                          Region: West          0.050      0.019             0.130             <0.001

^a See Tables 3 and 4 for definitions; ^b OR = odds ratio; ^c CI = confidence interval of the OR; ^d average odds in the baseline predictor groups; ^e reference-level
category; results significant at 5% error level are indicated in bold.





Figure 2 Extrapolating between-farm association frequency from the Poultry Network Database to the Great Britain Poultry Register. 

County-level average probabilities of small, medium and large between-farm association frequencies, as observed in the Poultry Network 
Database (n = 662) (a), and as predicted following extrapolation to the Great Britain Poultry Register (GBPR) (n = 3009) using fitted statistical 
models (farms known to be associated with the large slaughterhouse represent only ~3% of GBPR farms and therefore cannot be seen from this
figure) (b). Pie sizes are proportional to the county-level number of farms for the respective datasets. 



the West Midlands and South West of England (Figure 
3c). Scotland and Wales on the other hand appear dis- 
tinct; their large between-farm association frequency 
propensity is different to the other regions (i.e. the 95% 
CIs do not overlap the other regions), whilst they appear 
more similar in terms of their medium between-farm 
association frequency probabilities (Figures 3a and 3b). 
Furthermore, the width of the CIs generated using the 
GBPR demonstrates our confidence in these estimates 
and whether their likely range is comparable between 
regions. Prioritising regions based on the rank order of 
our confidence in the estimated probabilities (i.e. more 
confidence can be ascribed to a narrower CI) reveals dif- 
ferences across the between-farm association frequency 
categories (Table 7). 
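
The ranking step can be sketched in a few lines of Python: compute the CI range (upper bound minus lower bound) for each region and sort the regions by it. The region names and bounds below are hypothetical placeholders rather than values from Table 7, and the function is an illustrative assumption, not the authors' code.

```python
def rank_by_ci_width(ci_bounds, widest_first=True):
    """Order regions by the width of their 95% confidence interval.

    ci_bounds maps a region name to a (lower, upper) tuple. With
    widest_first=True the least certain regions come first.
    """
    widths = {
        region: upper - lower for region, (lower, upper) in ci_bounds.items()
    }
    return sorted(widths, key=widths.get, reverse=widest_first)


# Hypothetical bounds for three unnamed regions.
bounds = {
    "Region A": (0.10, 0.54),
    "Region B": (0.30, 0.41),
    "Region C": (0.22, 0.28),
}
priority_order = rank_by_ci_width(bounds)
```

Sorting widest first places the regions with the least confidence at the top of the list, which matches the order in which further sampling would be prioritised.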

Discussion 

Geographical bias in network data 

The targeted sampling strategies employed in the col- 
lation of network data for epidemiological use may be 



inherently biased in terms of demographic representation.
Our results demonstrate how such demographic
bias may also result in a biased representation of
the network properties. Using an example of the British 
poultry industry network comprised of farms, slaughter- 
houses (SHs), catching companies (CCs) and integrated 
companies (ICs), we show how risk-based collation of 
the PND has potentially led to misrepresentation of 
between-farm connectivity. These findings are also
important for other poultry diseases transmitted
via fomites, such as Salmonella, Campylobacter and 
those associated with coccidiosis [31-33,46,47]. Our 
results have particular implications for highly pathogenic 
AI (HPAI) in GB, as predictive and exploratory models 
have been informed by the network structure provided 
by the PND [14-16]. 

Although the PND was considered a priori to be 
inherently biased in terms of its representation of farm 
characteristics, bias in the network characteristics had 
not previously been explored. Our results show how the 
Figure 3 Predicted regional-level between-farm association frequency extrapolated to farms recorded in the Great Britain Poultry
Register. Regional average probabilities of (a) large versus medium and (b) large versus small between-farm association frequencies (blue
circles), following extrapolation of network information to the Great Britain Poultry Register (n = 3009 farms). Error bars represent 95%
confidence intervals generated from 1000 stochastic simulations of randomly assigning each farm to a small, medium or large between-farm
association frequency group. Black triangles represent proportions of farms within these categories observed from the Poultry Network Database
(n = 662 farms). (c) Geographical clustering of the regional predicted probabilities represented by their corresponding colours (note: Scotland
and Wales were considered distinct from the other regions). W = Wales; S = Scotland; L = Greater London; WM = West Midlands; SW = South
West; EM = East Midlands; NE = North East; E = East; Y = Yorkshire; NW = North West and SE = South East of England.



geographical distribution of between-farm association 
frequency, as inferred from the PND, significantly dif- 
fered following extrapolation of this network data to the 
GBPR (Figures 3a and 3b). The purpose of this extrapolation
process was not to accurately predict farm-level
connectivity for farms recorded in the GBPR, and it
assumes that the statistical association between the farm-level
predictors and between-farm association frequency
is true. Rather, extrapolating this network information was a
method by which to test the PND network associations,
making use of the more representative distribution of
farm-level factors provided by the GBPR.



Our analyses have demonstrated heterogeneities in the 
demographic profile between the datasets, highlighting 
types of farms and regions of GB where network data 
should be expanded. The confidence intervals for probabilities
of between-farm association frequencies, estimated
for the GBPR data, reflect the precision of these
estimates (Figures 3a and 3b). We recommend that further
sampling be carried out within regions where we
have relatively poor confidence in our estimates, in particular
prioritising regions for which we have the least
confidence in large between-farm association
frequency probabilities (i.e. first column of Table 7).
Table 7 British regions ranked by confidence interval widths for estimated probabilities of between-farm association frequencies

| Regions ranked by L (a) | CI range for L (b) | Regions ranked by M (a) | CI range for M (b) | Regions ranked by S (a) | CI range for S (b) |
|---|---|---|---|---|---|
| Greater London | 0.438 | North East | 0.132 | East | 0.065 |
| North East | 0.208 | Wales | 0.056 | South West | 0.069 |
| North West | 0.106 | North West | 0.048 | West Midlands | 0.074 |
| Wales | 0.103 | East Midlands | 0.044 | Scotland | 0.075 |
| South East | 0.092 | Yorkshire | 0.043 | East Midlands | 0.076 |
| Yorkshire | 0.089 | South East | 0.032 | South East | 0.092 |
| West Midlands | 0.074 | East | 0.027 | Yorkshire | 0.096 |
| Scotland | 0.072 | West Midlands | 0.023 | North West | 0.117 |
| South West | 0.069 | Scotland | 0.016 | Wales | 0.119 |
| East Midlands | 0.068 | South West | 0.015 | North East | 0.226 |
| East | 0.060 | Greater London | <0.001 | Greater London | 0.375 |

(a) Regions ranked in order of priority based on confidence in predicted probabilities of large (L) or medium (M) between-farm association frequency (CI range ranked from highest to lowest), and small (S) between-farm association frequency (CI range ranked from lowest to highest).

(b) The 95% confidence interval range (upper bound - lower bound) for predicted probabilities of large (L), medium (M), and small (S) between-farm association frequencies.
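As a minimal sketch of how the ranking in Table 7 could be reproduced, the snippet below orders regions by the width of their 95% CI for large between-farm association frequency. The helper name `rank_regions` and the dictionary layout are illustrative assumptions; the widths are the values reported in the first two columns of Table 7.

```python
# CI ranges (upper bound - lower bound) for large (L) between-farm
# association frequency, as reported in Table 7.
ci_range_large = {
    "Greater London": 0.438, "North East": 0.208, "North West": 0.106,
    "Wales": 0.103, "South East": 0.092, "Yorkshire": 0.089,
    "West Midlands": 0.074, "Scotland": 0.072, "South West": 0.069,
    "East Midlands": 0.068, "East": 0.060,
}


def rank_regions(ci_range, widest_first=True):
    """Return region names ordered by CI range (hypothetical helper)."""
    return sorted(ci_range, key=ci_range.get, reverse=widest_first)


# Widest interval first: regions where confidence is lowest are sampled first.
priority = rank_regions(ci_range_large)
```

Passing `widest_first=False` gives the lowest-to-highest ordering used for the small (S) column.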



Methodological considerations 

Using multivariable logistic regression we have identified 
statistically significant (p < 0.01) associations between 
farm-level factors and between-farm association frequency 
using the PND. We found that small (based on both the 
number of poultry houses and total bird count), non-inte- 
grated, free-range farms were more likely to have a large 
between-farm association frequency. Although our aim 
here was not to directly determine the impact of network 
biases on disease transmission predictions, drawing valid 
conclusions from analyses of contact heterogeneity 
requires consideration of systematic errors in sampled net- 
work data. The analyses here did not directly allow for 
such inference as between-farm association frequencies do 
not necessarily correlate with AI exposure frequencies. For 
example, although we found that free-range farms may 
have a greater overall between-farm association frequency, 
we would expect them to have fewer farm visits on a daily 
basis due to their typically longer production cycles and 
smaller bird throughput. 

Nevertheless, the chance of a farm becoming exposed to 
AI virus during a slaughterhouse-related farm visit will 
depend in part on the number of farms visited by a single 
SH vehicle and catching team within a single day. We 
believe that it can be reasonably hypothesised that pre- 
mises associated with larger SHs (i.e. with a greater num- 
ber of associated farms), such as the free-range farms in 
our analyses, may have a greater risk of infection from 
other associated farms. This is because of the likely greater 
number of farm clients visited in one day by the vehicles 
of these larger SHs (up to a threshold level of a feasible 
number of daily farm visits) [Jennifer Dent, pers. comm.]. 
In the case of CC movements, an analysis of temporally 
explicit catching-related movement data suggests they 
may be relatively less important than SH vehicles for AI 
transmission, as only one farm was visited by a catching
team within a single day for 84% of the recorded farm
visits; however, up to seven visits within a day were possible
[48], and this result could be limited by the representation
of only one CC.

One source of missing data within the PND results from 
non-reporting of information by at least one farmer across 
all data fields (Table 3). Although methods for imputing 
such missing values for the purpose of statistical regres- 
sion analyses exist [49-51], such measures would likely 
add to the uncertainty in our extrapolated outputs and so 
were considered inappropriate for the purpose of the ana- 
lysis here. In any case, it was determined unlikely that 
such non-reporting resulted in systematic errors in the 
estimated model coefficients, as no significant differences 
were identified from a comparison of univariate ORs cal- 
culated before and after the removal of records with miss- 
ing data (Tables 1 and 2). 

Existing analyses have used the PND without consideration
of data biases. Truscott et al. (2007) used the PND to
derive a negative binomial distribution for the number of
contacts. Similarly, Sharkey et al. (2008) used the PND to
inform the geographical profile and frequency of farm
movements, and Dent et al. (2008) used the PND to infer
farm associations through shared industry contacts (as in
the analyses here). These studies have thereby potentially 
misrepresented the extent of network connectivity through 
the under-representation of smaller farms. Through better 
characterisation of these misrepresented sectors of the 
poultry industry, the use of poultry network data for 
informing predictive models of AI control can be more reli- 
ably assessed. 

Epidemiological implications 

Our results suggest that free-range farms may have 
more extensive implications for AI control measures 
than previously anticipated. Free-range farms could be 



Nickbakhsh et al. BMC Veterinary Research 2011, 7:66
http://www.biomedcentral.com/1746-6148/7/66



targeted both to minimise the risk of introduction 
through contact with wild birds, such as through tar- 
geted surveillance [52], and - via improved biosecurity 
measures - to minimise the risk of onward spread 
through SH vehicle movements. Furthermore, free-range 
farms may have comparatively different logistical considerations
in terms of the extent of contact tracing due to
their potentially wide range of associations. These implications
for disease control measures, to minimise
between-farm spread via fomites during farm visits, are 
applicable to the period prior to the detection and noti- 
fication of an outbreak to the authorities [17]. Once 
notification has occurred, the risk of between-farm
spread will depend on how rapidly and where control
measures are implemented, as well as on poultry farm
density if airborne mechanisms of spread are important
[35]. Whether the observed demographic bias in net- 
work connectivity does indeed correspond to infection 
risk will be the focus of future work incorporating tem- 
porally explicit CC movement data. 

Using the PND to inform predictive models of AI
control may also lead to a misrepresentation of maximum
between-farm association frequency at a national
scale. The different implications for regional-level disease
control between the datasets highlight the potential
difficulties of relying upon data subsets to infer
disease control at this scale. When comparing sampling
coverage (the geographical distribution) and capacity 
(the proportion of the population captured) between the 
datasets alone, Scotland, the East and the South East of 
England appear particularly under-sampled by the PND 
(Figures 2a and 2b). However, significant under-estima- 
tion of large between-farm association frequency was 
found, when informed by the PND compared with the 
GBPR, for all regions except the South East and the 
North West of England (Figure 3). This suggests that
under-sampling by the PND alone is not predictive
of bias in these network data.

We recommend that future data collection should tar- 
get those farms where additional sampling could 
improve our confidence in estimated between-farm 
association frequencies. By ranking regions based on our 
confidence in these estimates we demonstrate how data 
collection can be prioritised, in particular in those 
regions where we have relatively low confidence in large 
between-farm association frequencies, such as Greater 
London and the North East of England (Table 7). We 
also highlight the apparent difference in large between- 
farm association frequency for Scotland and Wales, 
which appear distinct from the other regions despite 
their relatively narrow confidence intervals (Figure 3). 
Such differences between regions may be useful for 
informing targeted disease control strategies. 



Future data collection should also be directed towards 
the subset of farms within the GBPR which were unclas- 
sified in terms of their probability of a large between- 
farm association frequency (see 'Extrapolating network 
data to the GBPR' in the Methods section). The farm- 
level predictors of large between-farm association fre- 
quency may only reflect the characteristics of farms con- 
nected to the large SH in the PND; it may not be 
appropriate to generalise and assume that farms with 
similar characteristics will also be associated with other 
large SHs. As the PND was deliberately targeted at lar- 
ger poultry industry premises, the very large SH in the 
PND may represent the only one in GB of this size; 
however, the sampling procedure captured only 47.5% 
(57/120) of SHs approved by the British Food Standards 
Agency at the time these data were collated [Lucy 
Snow, pers. comm.]. Therefore, a better understanding 
of the activities of unsampled SHs is also important. 

Conclusions 

We have shown how systematic errors in the demo- 
graphic characterisation of network data, resulting from 
targeted sampling procedures, can bias the picture of 
between-host network connectivity. Detailed analyses of 
potential network bias within the PND are an important 
step towards obtaining a more accurate characterisation 
of the British poultry industry network structure. Pro- 
viding a means of using this network information in a 
more representative way can help us more reliably infer 
the role of contact heterogeneities in the spread of poul- 
try diseases. Based on the distribution of demographic 
factors represented by the GBPR, we have demonstrated 
that between-farm connectivity inferred from the PND 
may be biased. Sampling coverage and capacity alone are
not indicative of this network bias; estimates of
between-farm association frequency differed significantly 
across all regions of GB following extrapolation to the 
GBPR. We recommend that regions where we have rela- 
tively low confidence in our estimates of large between- 
farm association probability should be prioritised for 
future poultry network data collection. A subset of
farms unsampled by the PND, and unclassified in terms
of their large between-farm association frequency probability,
was identified and we suggest that these are also targeted
in future data collection exercises. Evaluating
whether and how the between-farm association frequen- 
cies impact on the risk of between-farm transmission 
will be the focus of future work. 

Methods 

Inferring between-farm association frequency 

The PND consisted of surveys administered to: (i) single- 
site and (ii) multi-site farm premises, (iii) slaughterhouses 
(SHs) and (iv) catching companies (CCs), as informed by
a NEEG (National Epidemiology Emergency Group) and
CERA (Centre for Epidemiology and Risk Analysis) data
collection exercise for Defra [53]. Catching companies
comprise teams of personnel who are responsible for
catching birds and loading them into vehicles for transportation
to the SH. These companies may be independent
and contracted by a SH, or employed by SHs or
CCs who provide their own catching teams [28]. In total,
these surveys provided information on 4,067 farm premises,
96 SHs and 102 CCs. These data were used to
construct a between-farm association matrix, based on
the assumption that farms that share the same SH or CC, or
that are linked through an integrated company (IC), were potentially
epidemiologically linked, and therefore potential sources of
AI virus exposure to each other [16].

SHs and CCs were considered to be independent 
industry layers, as CC teams and SH vehicles follow 
independent schedules, and so were considered to have 
different potential mechanisms of spreading AI between 
farms. For example, farms that share the same SH may 
share AI exposure indirectly through fomites via SH 
vehicles, should they visit multiple farms without disin- 
fecting wheels or the bird carrying crates [32,54]. Farms 
that share the same CC may also share AI exposure risk
through fomite transmission, but in this case via the
wheels of vehicles transporting catching team personnel
between farms, forklift trucks, or through contamination
of personnel clothing and equipment [19,33], especially
if they visit multiple farms within a single day
[28]. The main risk to biosecurity results from the
catchers' footwear, clothing and masks/gloves if these are
re-used on different poultry premises without sufficient
disinfection [28]. A further potential contact mechanism
was explored based on between-farm associations 
through ICs, to represent the risks associated with the 
movement of personnel and shared equipment by these 
farms [20,22]. No data were available for other potential
mechanisms of transmission, such as through feed delivery
[54,55], egg collection [26] or artificial insemination
visits [56], and these are therefore not represented here.

Quantifying between-farm association frequency 

A subset of farms captured by either the SH or CC sur- 
veys (n = 3308), and therefore for which only partial 
industry contact information was known, were used to 
inform the between-farm association matrix. This was 
considered appropriate as these farms contribute to the 
association-frequency of other farms captured by both 
surveys that were used in the statistical analyses (see 
Figure 4). 

Summing the rows (or columns) of the between-farm
association matrix gave the total farm-level between-farm
association frequency. For example, if farm i was
associated with farm j through sharing the same
SH or CC, or through being part of the same IC, this was
represented by 1 in the matrix, or by 0 if they were not
associated. These industry layers, although considered
independent, were combined in the calculation of 
between-farm association frequency due to lack of 
knowledge regarding their relative impact on disease 
transmission potential. Although the strength of contact 
may vary between these industry layers, their combina- 
tion provides insight into the range of total associations 
a farm may have. This is important when considering
the logistics of contact tracing, for example, particularly
under outbreak circumstances where the importance of
different types of contact is not known. No temporally
explicit information was available for the inferred 
between-farm associations, and we note that they may 
be considered representative of a maximum frequency, 
since not all associations will be active over any given 
time period. 
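The construction described above can be sketched as follows. This is an illustrative outline only, assuming each farm is recorded with the set of industry contacts (SH, CC or IC identifiers) it uses; the function name, data layout and toy identifiers are hypothetical and are not taken from the PND.

```python
from itertools import combinations


def association_frequency(farm_links):
    """farm_links maps farm id -> set of industry contact ids (SH, CC or IC).

    Two farms are associated (matrix entry 1) if they share any contact,
    regardless of which industry layer the contact belongs to.
    """
    farms = sorted(farm_links)
    index = {farm: k for k, farm in enumerate(farms)}
    n = len(farms)
    matrix = [[0] * n for _ in range(n)]
    for a, b in combinations(farms, 2):
        if farm_links[a] & farm_links[b]:
            matrix[index[a]][index[b]] = 1
            matrix[index[b]][index[a]] = 1
    # Row sums give each farm's between-farm association frequency.
    return {farm: sum(matrix[index[farm]]) for farm in farms}


# Hypothetical toy data: all identifiers are invented for illustration.
toy_links = {
    "farm_A": {"SH1", "CC1"},
    "farm_B": {"SH1"},
    "farm_C": {"CC1", "IC1"},
    "farm_D": {"IC2"},
}
frequency = association_frequency(toy_links)
```

Because the matrix is binary, a pair of farms sharing both a SH and a CC still contributes a single association, which matches the combined treatment of the industry layers described in the text.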

Statistical analyses 

Response variable: between-farm association frequency 
distribution 

All farms with a recorded between-farm association fre- 
quency >1079 were associated with one particularly 
large SH, resulting in a bimodal frequency distribution 
(Figure 5). This large SH (black circle, Figure 1) was 
located in the North of England, but serviced premises 
throughout GB that represent a range of chicken pro- 
duction types; the majority of their clients were layers (n 
= 129, 75%), a smaller proportion were broiler breeders 
(n = 39, 23%) and a small number were broilers (n = 4, 
2%), based on data for farms captured by both SH and 
CC surveys. The between-farm association frequency 
distribution aggregated farms into two groups; those 
categorised as 'L' were clearly separate (see Figure 5).
This non-standard distribution led to the dichotomisa- 
tion of the response variable and therefore logistic 
regression was used. 

With the objective of characterising types of PND 
farms according to their between-farm association fre- 
quency, it was considered appropriate to group farms 
that did not form part of the large SH cluster into two 
further groups (categorised as small (S) and medium 
(M), see Figure 5). As there was no epidemiological or 
practical interpretation of the between-farm association 
frequency, the cut-off for this dichotomisation
of the data was chosen at approximately the mid-point.
Whilst this choice was arbitrary, based on an explora- 
tory rationale, it enabled a more direct comparison with 
scenario 1 analyses than would have been permitted by 
fitting a more complex continuous distribution. Logistic 
regression was therefore also used for scenario 2 
analyses. 






Dataset 1, n = 4067 farms: Full dataset of farms amalgamated from single-site, multi-site, SH and CC surveys.

Data reduction 1

Dataset 2, n = 3308 farms: These farms were captured by either the SH or CC surveys and were used to infer between-farm association frequency.

Data reduction 2

Dataset 3, n = 662 farms: These farms were captured by both SH and CC surveys. Their PND informed between-farm association frequencies were used in a geographical comparison following extrapolation of this network information to the GBPR.

Data reduction 3

Dataset 4, n = 348 farms: These farms have complete data across all farm-level predictor variables and therefore were used in the statistical modelling analyses.

Figure 4 Reducing the Poultry Network Database into data subsets. SH = slaughterhouse; CC = catching company; PND = Poultry Network Database; GBPR = Great Britain Poultry Register.



As farms with complete industry contact information 
were required to determine statistical associations 
between the farm-level predictors and between-farm 
association frequency, all farms for which full contact 
information was not known (i.e. captured by only either 
SH or CC surveys) were excluded for the purpose of the 
statistical analyses. This resulted in a reduction in the 
dataset from 3308 to 662 farm records. 

In summary, three between-farm association frequency
groups were formed: (i) small (S_af; 1-299 associations, n =
374 farms), (ii) medium (M_af; 301-879 associations, n =
141 farms) and (iii) large (L_af; 1079-1623 associations, n =
147 farms). Based on these categories, two statistical
scenarios were formed with different response variables:
(i) L_af versus S_af/M_af and (ii) M_af versus S_af, referred to as
scenarios 1 and 2 respectively (Figure 5). The prevalences
of L_af and M_af were 22% and 27% for scenarios 1 and 2,
respectively.
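The grouping and the two scenario prevalences can be checked with a short sketch. The group sizes are those reported above; the function name and the treatment of values that fall between the observed group ranges (for example, exactly 300 associations) are assumptions made for illustration.

```python
def categorise(frequency, s_max=299, m_max=879):
    """Assign S, M or L from a between-farm association frequency.

    Cut-offs follow the observed group ranges; how values falling in the
    gaps between those ranges are treated is an assumption of this sketch.
    """
    if frequency <= s_max:
        return "S"
    if frequency <= m_max:
        return "M"
    return "L"


# Group sizes reported for the 662 farms with full contact information.
counts = {"S": 374, "M": 141, "L": 147}

# Scenario 1: L versus S/M. Scenario 2: M versus S (L farms excluded).
prevalence_scenario1 = counts["L"] / sum(counts.values())
prevalence_scenario2 = counts["M"] / (counts["S"] + counts["M"])
```

Rounded to whole percentages, these ratios reproduce the 22% and 27% quoted in the text.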

Farm-level predictor variables 

A subset of farms (n = 348) with no missing data for the
demographic predictor variables was used for the statistical
analyses (Figure 4). Following this data reduction, the
distribution of farms across the between-farm association
categories was as follows: (i) small (S_af; 3-294 associations,
n = 183 farms), (ii) medium (M_af; 301-674 associations,
n = 87 farms) and (iii) large (L_af; 1079-1623 associations,
n = 78 farms). The prevalences of L_af and M_af were 22% and
32% for scenarios 1 and 2, respectively. The possibility
that this procedure introduced bias into the statistical analyses
was assessed by comparing univariable ORs for the
predictor variables, computed both before and after the
data exclusion (Tables 1 and 2).

Figure 5 Distribution of between-farm association frequency and analysis scenarios. A comparison between large (1079-1623 associations,
n = 147 farms) and small/medium (1-879 associations, n = 515 farms) between-farm association frequencies formed scenario 1 analyses, and a
comparison between medium (301-879 associations, n = 141 farms) and small (1-299 associations, n = 374 farms) between-farm association
frequencies formed scenario 2 analyses. Note: this figure refers to the analysis prior to the removal of records with missing data (i.e. n = 662
farms) and was not qualitatively different following this data reduction.

Farm-level predictor variables from the PND were 
selected for inclusion in the statistical analysis if they 
were available from the GBPR, and if the proportion of 
missing observations was not >50% (Table 3). Total
farm-level bird count ranged from 2,700 to 512,000
birds (median = 77,850 and 48,900 for scenario 1 and 2
data subsets, respectively), and total farm-level poultry
house count ranged from 1 to 4 houses (median = 3 for
both scenario 1 and 2 analysis data subsets). Numeric
(bird count and house count) and management type 
(indoor and free-range) variables were each grouped 
into binary small or large and yes or no categories 
respectively, then re-categorised into their cross-classifi- 
cations (Table 4). This re-grouping was carried out in 
order to take account of collinearity (assessed by Pear- 
son's product-moment correlation coefficients > 0.25) 
without losing information through the exclusion of pre- 
dictor variables. Furthermore, categorising the numeric 
variables was useful for interpretation purposes, as the 
objective was to characterise farms into types based on 
their demographic profile. 



Data clustering 

Due to the complexity of clustering within the PND, 
multilevel multivariable logistic regression was initially 
used to control for the data dependency between farms 
affiliated with integrated companies. However, these 
models were unstable; three farms with particularly 
large model residual values had a great influence on scenario
1 model coefficients (the ifNY predictor variable was
particularly unstable). Despite the instability of the multilevel
models, in the subsequent analyses comparing the
geographical distribution of between-farm association 
frequency using the PND with that following extrapola- 
tion to the GBPR, they gave qualitatively similar results 
(not shown). Single-level multivariable logistic regression 
was therefore considered sufficient. 

Statistical modelling 

All statistical analyses were conducted using R v2.9.2
[57], and models were developed using the glm and
glmer functions for single-level and multilevel models
respectively (for glmer see lme4 package [58]). All predictors
associated with the response in univariable analyses
(p-value < 0.25) were included in the
multivariable models [59]. Model building was carried
out manually using a backward reduction method and
all potential 2-way interactions were explored between
predictors of the most parsimonious model. Model
selection was based on the AICc value, a second-order
variant of the Akaike Information Criterion [60]. See
equations 1 and 2 for the form of the final models corresponding
to scenarios 1 and 2 respectively.

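\[
\begin{aligned}
\operatorname{logit}\Pr(L_{af,i}) = {} & \beta_0 + \beta_1\,\mathrm{hbLS}_i + \beta_2\,\mathrm{hbSL}_i + \beta_3\,\mathrm{hbLL}_i \\
& + \beta_4\,\mathrm{ifNN}_i + \beta_5\,\mathrm{ifNY}_i + \beta_6\,\mathrm{ifYY}_i \\
& + \beta_7\,\mathrm{Integrated}_i + \beta_8\,\mathrm{East}_i + \beta_9\,\mathrm{Scotland}_i \\
& + \beta_{10}\,\mathrm{Wales}_i + \beta_{11}\,\mathrm{West}_i + \beta_{12}\,\mathrm{ifNN}_i \times \mathrm{Integrated}_i \\
& + \beta_{13}\,\mathrm{ifNY}_i \times \mathrm{Integrated}_i + \beta_{14}\,\mathrm{ifYY}_i \times \mathrm{Integrated}_i
\end{aligned}
\tag{1}
\]

\[
\begin{aligned}
\operatorname{logit}\Pr(M_{af,i}) = {} & \beta_0 + \beta_1\,\mathrm{hbLS}_i + \beta_2\,\mathrm{hbSL}_i \\
& + \beta_3\,\mathrm{hbLL}_i + \beta_4\,\mathrm{Integrated}_i + \beta_5\,\mathrm{East}_i \\
& + \beta_6\,\mathrm{Scotland}_i + \beta_7\,\mathrm{Wales}_i + \beta_8\,\mathrm{West}_i
\end{aligned}
\tag{2}
\]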
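For readers unfamiliar with the AICc used for model selection, the sketch below shows the standard small-sample correction, AICc = AIC + 2k(k + 1)/(n - k - 1), where k is the number of estimated parameters and n the number of observations. The function name and the log-likelihood values are hypothetical and are not results from the fitted models.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Second-order AIC: AIC plus a small-sample correction term."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction


# Hypothetical log-likelihoods for two candidate models fitted to 348 farms;
# these numbers are invented purely to illustrate the comparison.
full_model = aicc(log_likelihood=-120.0, n_params=15, n_obs=348)
reduced_model = aicc(log_likelihood=-121.0, n_params=12, n_obs=348)
```

Under backward reduction, the candidate with the lower AICc would be retained; in this invented example that is the reduced model.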

The model fit and predictive ability were determined 
by plotting Studentized residuals and leverage values 
against the predicted probabilities for each covariate pattern
[59], and by obtaining the area under the ROC
(Receiver Operating Characteristic) curve for a range of
model sensitivities and specificities. Removing the three
most influential data points, as assessed by their Cook's
statistic [59], did not substantially change the model
coefficients (results not shown).

Extrapolating network data to the GBPR 

Predicted probabilities of a small (pp_s), medium (pp_m)
and large (pp_l) between-farm association frequency were
obtained for each farm (denoted as i) recorded in the
GBPR that had no missing data for the corresponding
predictor variables (n = 3009). This extrapolation was
carried out using a logistic transformation of the linear
predictors; coefficients were obtained from the models
fitted to the PND, and predictor values were substituted
using predictor variable information informed by the
GBPR. As large between-farm association frequencies
were associated only with a single SH, farms in the
GBPR that matched this profile (high pp_l value) were
considered similar to each other but 'unclassified' with
regards to their between-farm association frequency
(though for convenience are referred to as L_af).
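The extrapolation step can be illustrated with a minimal sketch of the logistic transformation. All coefficient values and the farm record are hypothetical, not the fitted PND estimates. The way the two scenario models are combined into three probabilities (treating scenario 2 as conditional on a farm not being large) is an assumption of this sketch; the text does not state how the combination was done.

```python
import math


def inverse_logit(linear_predictor):
    """Logistic transformation of a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def predicted_probability(coefficients, farm):
    """coefficients: term -> fitted value, including an 'intercept' entry.
    farm: term -> 0/1 indicator taken from register data."""
    eta = coefficients["intercept"]
    for term, beta in coefficients.items():
        if term != "intercept":
            eta += beta * farm.get(term, 0)
    return inverse_logit(eta)


# Hypothetical coefficients and farm record, NOT the fitted PND values.
scenario1 = {"intercept": -2.0, "hbLS": 0.5, "Integrated": -1.0, "Scotland": 1.5}
scenario2 = {"intercept": -1.0, "hbLS": 0.3, "Integrated": 0.4, "Scotland": 0.2}
farm = {"hbLS": 1, "Integrated": 0, "Scotland": 1}

p_large = predicted_probability(scenario1, farm)
p_medium_given_not_large = predicted_probability(scenario2, farm)

# Assumption of this sketch: scenario 2 is read as conditional on "not L".
pp_l = p_large
pp_m = (1.0 - p_large) * p_medium_given_not_large
pp_s = (1.0 - p_large) * (1.0 - p_medium_given_not_large)
```

Under this reading the three probabilities sum to one for every farm, which is convenient for the random allocation used later to generate confidence intervals.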

Comparative analysis of geographical variation 

For the purpose of comparing the geographical variability
between the PND and GBPR, the probability of each
GBPR farm having a S_af, M_af and L_af was calculated
from the fitted predicted probabilities (see section on
'Extrapolating network data to the GBPR'). These were
summarised on a county-average level and compared to
the county-average prevalence of observed S_af, M_af and
L_af taken directly from the PND (using all the data for
which full contact information was known, n = 662)
using ArcGIS v.9.2 (ArcView®, ESRI, Redlands, CA,
USA).

In order to assess at a regional level the significance of
the observed geographical pattern following the extrapolation
to the GBPR, 95% confidence intervals were stochastically
generated by randomly allocating each farm
to a S_af, M_af or L_af group based on its fitted predicted
probabilities. This process was repeated for 1000 iterations
of randomly allocating farms to a group, enabling
the quantification of 2.5% and 97.5% quantiles of the
probabilities of S_af, M_af and L_af per region, thus representing
the lower and upper bounds of the 95% CIs,
respectively (Figures 3a and 3b).
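The stochastic procedure above can be sketched as a simple Monte Carlo routine. The function name, the fixed random seed, the toy region and the order-statistic quantile rule are all assumptions made for illustration; the original analysis was carried out in R and its exact quantile method is not stated.

```python
import random


def regional_intervals(farm_probs, n_iter=1000, seed=0):
    """farm_probs: list of (pp_s, pp_m, pp_l) tuples for farms in one region.

    Returns {group: (2.5% quantile, 97.5% quantile)} of the regional
    proportion of farms allocated to each group across iterations.
    """
    rng = random.Random(seed)
    groups = ("S", "M", "L")
    proportions = {g: [] for g in groups}
    for _ in range(n_iter):
        tally = {g: 0 for g in groups}
        for probs in farm_probs:
            # Allocate the farm to one group according to its probabilities.
            tally[rng.choices(groups, weights=probs)[0]] += 1
        for g in groups:
            proportions[g].append(tally[g] / len(farm_probs))
    intervals = {}
    for g in groups:
        ordered = sorted(proportions[g])
        # Simple order-statistic approximation to the quantiles.
        lower = ordered[int(0.025 * (n_iter - 1))]
        upper = ordered[int(0.975 * (n_iter - 1))]
        intervals[g] = (lower, upper)
    return intervals


# Hypothetical region of 200 farms sharing the same predicted probabilities.
toy_region = [(0.5, 0.3, 0.2)] * 200
ci = regional_intervals(toy_region)
```

The difference between the upper and lower bound for each group corresponds to the CI range used to rank regions in Table 7.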